feat(ARSN-572): Propagate W3C trace context to MongoDB metadata writes#2611
feat(ARSN-572): Propagate W3C trace context to MongoDB metadata writes#2611delthas wants to merge 1 commit into
Conversation
Hello delthas,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Request integration branchesWaiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command: Alternatively, the |
d42f30b to
01693cd
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## development/8.3 #2611 +/- ##
===================================================
+ Coverage 73.48% 73.54% +0.05%
===================================================
Files 222 223 +1
Lines 18183 18215 +32
Branches 3762 3767 +5
===================================================
+ Hits 13362 13396 +34
+ Misses 4816 4814 -2
Partials 5 5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
01693cd to
ca456c9
Compare
ca456c9 to
a30bce8
Compare
a30bce8 to
39233c8
Compare
39233c8 to
a1fe7ca
Compare
a1fe7ca to
eb5a535
Compare
| traceparent?: string; | ||
| tracestate?: string; |
There was a problem hiding this comment.
| traceparent?: string; | |
| tracestate?: string; | |
| parentId?: string; | |
| state?: string; |
State is a string or a list of possible value ?
There was a problem hiding this comment.
Those names are standard, come from a W3C spec: https://www.w3.org/TR/trace-context/
There was a problem hiding this comment.
@delthas indeed, I saw that we use them in several place in the code but when stored in mongo, under the traceContext structure, it don't make sense anymore to keep trace
There was a problem hiding this comment.
I'd heavily lean towards keeping the standard names, as they have specific formats, with meanings specified in the specification. Specifically, traceparent is not a plain parentId at all, it is a carefull formatted string, such as traceparent: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01, which carries a version, a trace ID, a span ID, and flags, including the standard "is sampled" flag. parentId would be outright wrong, but even if we gave it a better name, it would be less precise and more confusing than traceparent, which is exactly the format the consumer would use. For example, see my backbeat code that recreates an OTEL span from saved trace context: tracecontext and traceparent are the field names used by the library.
There was a problem hiding this comment.
If you prefer to keep it, then let's go. It still don't make sense for me but I'm fine to move forward
eb5a535 to
7c5ecce
Compare
7c5ecce to
28bca79
Compare
Add a traceContext field to ObjectMD carrying the currently-active OTEL trace context (W3C traceparent/tracestate). Inject it automatically at the three metadata write chokepoints where originOp is set today: internalPutObject, repair, and internalDeleteObject. The value ends up in the MongoDB oplog; downstream consumers (backbeat, sorbet) can extract it to continue the trace across the async boundary, closing the loop on end-to-end tracing for flows that cross the oplog. When OTEL is not active on the caller (no SDK, or request outside a traced context), captureCurrentTraceContext returns undefined, setTraceContext no-ops, and the field stays absent — zero cost. Only adds @opentelemetry/api as a runtime dependency (the API-only surface package, becomes a no-op when no SDK is registered). Issue: ARSN-572
28bca79 to
8619151
Compare
|
LGTM |
| * @param tc - W3C trace context carrier | ||
| * @return itself | ||
| */ | ||
| setTraceContext(tc?: TraceContextCarrier) { |
There was a problem hiding this comment.
I wonder if this is correct.
- By relying on a call to
setTraceContext, some (bad) code may unexpectedly update the metadata without updating the trace context : and we would end up with incorrect trace... (This has happened before with changes being made without callingsetOriginOp()) - Especially, updating metadata (as in "manipulating the ObjectMD") may be done by backbeat, but the actual write request is made by Cloudserver : so the traceContext should be the one at the bottom (in cloudserver) - and will however "derive" from the traceContext in backbeat
- It may thus be safer to handle this at lower level, like in MetadataWrapper (or even below, in specific backends? would allow correlating more details, like metadata or mongo inner details, but requires to duplicate the code and may break layering/abstraction...)
In addition (and maybe this becomes irrelevant with the above point, or maybe not): I know ObjectMD is mostly written with simple accessors, but this is causing issues. Would be better to allow passing the trace context at the proper point in the lifecycle of the object, and introduce the parameter at that specific point. For exemple, if trace should be passed whenever we make a new MD op, this should be a second parameter of setOriginOp()
There was a problem hiding this comment.
ok, looking below it is applied in mongo client : so correct indeed, but then the problem is more related to abstraction. Setting the trace context SHOULD NOT be done by "users" of ObjectMD (cloudserver and backbeat mostly), but only by metadata backend
→ so we should not have this method.
also it will not be present for other backends -esp. metadata- : should we consider handling this in MetadataWrapper instead of the actual backend? If we stick with current approach (in mongoclientinterface) there should be a ticket to do the same in metadata (and thus probably export applyTraceContext, as it will be used there)
| cb: ArsenalCallback<string | void>, | ||
| ): void { | ||
| MongoUtils.serialize(objVal); | ||
| applyTraceContext(objVal, captureCurrentTraceContext()); |
There was a problem hiding this comment.
since we always serialize objet before writing it, should we not apply traceContext in serialize ?
| * Generic over `data` so callers can use either an ObjectMD's `_data` | ||
| * or a raw `ObjectMDData` without type gymnastics. | ||
| */ | ||
| export function applyTraceContext( |
There was a problem hiding this comment.
this function is always called with captureCurrentTraceContext() in argument: should we not merge them?
| objMetadata.setOriginOp(params.originOp); | ||
| objMetadata.setTraceContext(captureCurrentTraceContext()); | ||
| objMetadata.setDeleted(true); | ||
| return next(null, objMetadata.getValue()); |
There was a problem hiding this comment.
- serialize() not called here, contrary to the other cases : not sure if this can create issue
- the whole parsing/serialization for adding 3 fields is probably overkill, when we could simply set the 3 fields manually (
obj.originOp = params.originOp ; obj.deleted = true ; ...) - we may even be able to use an aggregation pipeline to do this service side and possibly aomatically
not related to this PR, let's not fix it yet ; but please create a ticket to look at this
Context
Cloudserver recently gained OpenTelemetry tracing (CLDSRV-884, cloudserver PR scality/cloudserver#6140). That gives us a single end-to-end trace per S3 request spanning cloudserver → vault → hdproxy → hyperiod.
But many state mutations in the S3 stack are written to MongoDB and consumed asynchronously by worker services via the oplog:
Today the oplog entry carries no trace context, so async work performed hours or days after the S3 request cannot be tied back to the request that caused it. A user's
POST …?restoreproduces a trace that ends at the metadata write; the actual cold-tier retrieval by Sorbet is invisible from the original trace.Solution
Stamp a W3C trace context (
traceparent/tracestate) on every mutating metadata write, auto-derived from the currently-active OTEL context.The field lives on
ObjectMDDatanext tooriginOp:A new helper
captureCurrentTraceContext()reads the active OTEL context via@opentelemetry/apiand serializes it viapropagation.inject(). It's invoked at the threeMongoClientInterfacesites that currently setoriginOp:internalPutObjectrepairinternalDeleteObjectConsumers (backbeat, sorbet — out of scope for this PR) will later:
value.traceContext.traceparentfrom the oplog entrypropagation.extract()it to reconstruct a parent contextcontext.with(extractedCtx, ...)This closes the loop on end-to-end tracing across the async boundary.
Design decisions
1. Auto-inject at the arsenal write boundary (vs. param plumbing).
The alternative was to thread a new
traceContextparameter through the ~15 cloudserver call sites that already setoriginOp. Rejected because:@opentelemetry/context-async-hooks(loaded by the OTEL SDK at cloudserver startup) already propagates the request context through arsenal's async call chain for free. Re-reading it at the mongo write is idiomatic OTEL.originOpout of params and attaches it to theObjectMD.originOpeasy to forget on new API endpoints today.2. Only
@opentelemetry/apias a runtime dep (~10 KB, no SDK).The api package is specifically designed to be safe for libraries to depend on. When no SDK is registered (today's state for anyone not opting into cloudserver OTEL),
context.active()returns a no-op context,trace.getSpan()returns undefined, andcaptureCurrentTraceContext()returnsundefined.setTraceContext(undefined)no-ops; the field stays absent in the write. Zero runtime cost when OTEL is off.3. Optional field, no schema migration.
traceContextis absent by default — consistent with other optional ObjectMD fields likelegalHold,retentionDate. No default-value change in the ObjectMDData construction. Existing consumers that don't understand the field simply ignore it; fully backward compatible.4. No-op when no traceparent is present.
setTraceContext(undefined)andsetTraceContext({ tracestate: 'x' })(no traceparent) both no-op — the field only appears when the caller had an active trace. This prevents junk entries from any edge cases in the OTEL API.Interaction with async flows (cold restore as example)
Cold storage restore exercises both a sync and an async metadata write:
ObjectRestore:Post(sync, user-initiated viaPOST /bucket/key?restore) — cloudserver setsoriginOp; arsenal now also stampstraceContextpointing to the user's request trace. ✅ObjectRestore:Completed(async, Sorbet-triggered via cloudserver's/_/backbeat/*route) — behavior depends on Sorbet:traceparenton its HTTP call (ideally extracted from step 1's storedtraceContext): arsenal picks it up viacontext.with(remoteContext, ...)in cloudserver'sserver.js, and the Completed write ties back to the original user restore. ✅traceparent(today's state): safely no-ops — Completed write has notraceContext, no regression. ✅Same pattern applies to lifecycle expiration/transition, CRR replication, and GC.
Verification
Unit tests
tests/unit/models/ObjectMD.spec.js— 4 new tests covering:set/getTraceContextsetTraceContext(undefined)is a no-opsetTraceContext({ tracestate: ... })without traceparent is a no-opgetValue()serializestraceContextonly when setAll 128 ObjectMD tests pass (was 124 + 4 new).
End-to-end (follow-up on a test cluster)
ENABLE_OTEL=trueo.value.traceContext.traceparentshould be present, with characters 3-35 matching the Jaeger trace IDENABLE_OTEL=false→ field should be absentIssue: ARSN-572